Skip to content

Conversation

@cpaasch-oai
Copy link
Contributor

While working on additional tests for nexthop encapsulation, I realized there are some bugs in certain places across libnl.

I'm addressing quite a few of these with this PR.

Tests for each of these would be great, but it takes a bit more time. Thus, I fix it first and will add tests if I have time.

ipvti_mask is set to IPVTI_ATTR_OKEY when ipvti->okey is set. Thus, we
need to check for the correct attribute.

Signed-off-by: Christoph Paasch <[email protected]>
We need to clone rtnh_realms as well.

Signed-off-by: Christoph Paasch <[email protected]>
IFLA_GENEVE_PORT expects a u16, not u32.

Signed-off-by: Christoph Paasch <[email protected]>
ipgre_mask is set to IPGRE_ATTR_IFLAGS/OFLAGS, not
IFLA_GRE_IFLAGS/OFLAGS. Fix this.

Also, ipgre->local/remote are uint32_t, thus we need to use ntohl, not
ntohs.

Signed-off-by: Christoph Paasch <[email protected]>
The kernel expects a u16 for vxi_port, so use NLA_PUT_U16.

The vxlan_compare function is checking on vxi_proxy for too many
fields...
Fix that!

Signed-off-by: Christoph Paasch <[email protected]>
sit/ipip/ipvti->local/remote are uint32_t.
We need to use ntohl instead of ntohs.

Signed-off-by: Christoph Paasch <[email protected]>
When _ATTR_LINK is set, the link needs to be looked up before printing.
Similar as is done for ip6tnl.

So, we need to fix this in ip6gre and ip6vti.

TODO - it looks like we are leaking parent here.... I guess I need to
nl_auto free it... to be confirmed!!!

Signed-off-by: Christoph Paasch <[email protected]>
When dumping and the *_ATTR_LINK flag is present, we do a call to
link_lookup() which ends up getting a references on the link object.

Thus, if we want to avoid leaking things, we need to actually
rtnl_link_put the object. Do this with _nl_auto_rtnl_link.

Signed-off-by: Christoph Paasch <[email protected]>
name = rtnl_link_get_name(link);

name = NULL;
parent = link_lookup(link->ce_cache, ip6gre->link);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that link_lookup() returns a reference. So you would need _nl_auto_rtnl_link.

On the other hand, link_lookup() is internal API only (it can change). This API shouldn't return a reference/ownership, that would only make sense if cache and nl_object_get() were thread-safe and could be filled at the same time by other threads. But it is not, so nobody is going to unref the returned object unexpectedly. The callers just need to take care that they do nothing with the looked up link, after it might be destroyed (or take a reference here).

I would even go a step further and add a link_lookup_name() helper (that returns a const char *) and use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are leaking objects. 75352b7 intended to fix that.

If you rather prefer to change link_lookup to not take the ref and/or add link_lookup_name, I can do that !

nl_dump(p, " local ");
if(inet_ntop(AF_INET, &ipip->local, addr, sizeof(addr)))

if (inet_ntop(AF_INET, &ipip->local, addr, sizeof(addr)))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking the return value for inet_ntop() is wrong. This function can only fail, if you pass invalid address/family or buffer size, which the caller should just not do. There is _nl_inet_ntop() for that reason.

I mean, if libc is not able to convert an u32 / struct in6_addr to a string buffer (of sufficient size), then we should reimplement it. But of course, libc is not going to fail here. Ever.

And the else path is just unreached code and noise and place for bugs.

But the patch is good. Maybe I will drop some inet_ntop() later -- unless you want to do it first :) Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I just blindly fixed whitespace here without actually reading what the purpose of inet_ntop here is 😅

@thom311 thom311 merged commit 9bb47cc into thom311:main Aug 7, 2025
4 checks passed
@thom311
Copy link
Owner

thom311 commented Aug 7, 2025

lots of important fixes here. Thank you. Merged.

about inet_ntop() and link_lookup(), I will do something. Hold on.

@thom311
Copy link
Owner

thom311 commented Aug 7, 2025

about inet_ntop, I did 5eeee28. I think this is fine.

About link_lookup(). If link->ce_cache is empty, then it falls back to a global cache. Which is in not thread-safe, for several reasons. That is bad, but without a larger fix for that, I am hesitant to no longer return a reference from link_lookup(). Of course, the refererence counting is also not thread-safe, but maybe there is an application out there that can exhibit that race, and no longer taking the reference count makes it works.

TODO, but making it all thread-safe is more effort. Normally, it would be fine that a nl_cache or a nl_object is not thread-safe. But once you hook them up as a global state where multiple threads can access them, it becomes a problem. Maybe this global cache should be thread-local. That might be the simplest solution, that does not require to add atomics and locks at many places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants